Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix table listing in Iceberg to skip non-Iceberg tables #1354

Merged
merged 3 commits into from
Sep 11, 2019

Conversation

lxynov
Copy link
Member

@lxynov lxynov commented Aug 21, 2019

#1324

Tests done:

  1. mvn clean install builds successfully in presto-hive directory.
  2. Introduced TestIcebergMetadataListing. mvn clean install builds successfully in presto-iceberg directory.
  3. End-to-end tests look good.

@cla-bot cla-bot bot added the cla-signed label Aug 21, 2019
@@ -176,7 +178,7 @@ public ConnectorTableMetadata getTableMetadata(ConnectorSession session, Connect
return schemaName.map(Collections::singletonList)
.orElseGet(metastore::getAllDatabases)
.stream()
.flatMap(schema -> metastore.getAllTables(schema).stream()
.flatMap(schema -> metastore.getTablesMatchingParameter(schema, TABLE_TYPE_PROP, ICEBERG_TABLE_TYPE_VALUE).stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the match case-sensitive or case-insensitive?

On our side we do case-insensitive, file metastore it's case-sensitive and HMS thrift call may be case-sensitive (or it may actually depend on hms backend db collation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi @linxingyuan1102 The change introduces the problematic behavior as AFAIK case-sensitivity depends on HMS backend db collation and for derby db it is case-sensitive. This leads to iceberg tables that are created in spark using an upper case 'ICEBERG' type become hidden in show tables while still valid and usable in queries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vrozov thanks for your comment. As you see, this was assumed as probable.
May I suggest to move discussion to #iceberg slack channel?
Otherwise a conversation under a closed PR may escape attention of some people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi Both #iceberg slack channel or new issue #1592 is fine with me. I commented here, to make it easier to refer to the root cause of the problem.

@lxynov
Copy link
Member Author

lxynov commented Aug 23, 2019

@findepi @sopel39
Thanks for the review. I've updated the PR.

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM form me. Let's wait for @findepi review

@lxynov
Copy link
Member Author

lxynov commented Aug 26, 2019

@findepi Thanks for the review. Comments addressed.

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Minor comments + a question to David.

@lxynov
Copy link
Member Author

lxynov commented Aug 28, 2019

Thanks. Comments addressed.

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@linxingyuan1102 thanks!

@electrum do you want to review changes in FileHiveMetastore.java?

@@ -119,6 +120,9 @@
private volatile boolean metastoreKnownToSupportTableParamEqualsPredicate;
private volatile boolean metastoreKnownToSupportTableParamLikePredicate;

private static final Pattern TABLE_PARAMETER_SAFE_KEY_PATTERN = Pattern.compile("^[a-zA-Z_]+$");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is these a reference to what are the safe values for HMS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is one. See #1354 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phd3 this is documented in the code (where the constant is actually used, so there is a little bit more context).
Could you please check that explanation there? If it's not sufficient for a reader, we should improve it.

@@ -86,6 +87,7 @@
private final LoadingCache<String, List<String>> databaseNamesCache;
private final LoadingCache<HiveTableName, Optional<Table>> tableCache;
private final LoadingCache<String, List<String>> tableNamesCache;
private final LoadingCache<List<String>, List<String>> tablesWithParameterCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather create a different class for <databaseName, parameterKey, parameterValue> than using a list to key the cache.

Copy link
Member

@phd3 phd3 Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can name it something liketableNamesCacheForPattern for consistency with tableNamesCache. both of them fetch tables for a database, the later one takes a filter into account.

Copy link
Member Author

@lxynov lxynov Sep 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can name it something liketableNamesCacheForPattern for consistency with tableNamesCache. both of them fetch tables for a database, the later one takes a filter into account.

"Pattern" might not be very accurate because we use both = and LIKE when constructing the filter and we restrict the filter value to be alphanumeric. And I think it should be better if we just name the cache field after the method whose results it is caching. So I'd rather stay with tablesWithParameterCache here.

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, otherwise looks good

@lxynov lxynov force-pushed the iceberg-table-listing branch 2 times, most recently from 37ee2dc to f5185b6 Compare September 3, 2019 22:31
@lxynov
Copy link
Member Author

lxynov commented Sep 3, 2019

@phd3 @findepi @electrum
Thanks for the review. I've addressed the comments.

@lxynov
Copy link
Member Author

lxynov commented Sep 10, 2019

I've rebased the changes against master to resolve conflicts. Do we want to merge this if everything is okay?

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will merge when Travis passes
(please ping me if I don't)

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting this up into multiple commits. Very easy to review and see the different changes.

Table listing in the Iceberg catalog should not include non-Iceberg tables.
@findepi findepi merged commit fc52c6a into trinodb:master Sep 11, 2019
@findepi
Copy link
Member

findepi commented Sep 11, 2019

Merged, thanks!

@findepi findepi mentioned this pull request Sep 11, 2019
6 tasks
@lxynov lxynov deleted the iceberg-table-listing branch September 11, 2019 07:29
@martint martint added this to the 319 milestone Sep 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants